GPU Architecture

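A warp (a wavefront in AMD's terms) is a group of threads that execute together, typically 32 on NVIDIA hardware and 32 or 64 on AMD. All threads in a warp run the same instruction at the same time, each on its own data.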
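A toy model of that lockstep behavior may help. The sketch below is plain Python, not GPU code, and every name in it (`WARP_SIZE`, `run_warp`) is made up for illustration. It assumes a 32-wide warp and models a branch the way lockstep hardware broadly handles one: the warp walks each side of the branch in turn, and only the lanes that took that side are active.

```python
WARP_SIZE = 32  # assumption: NVIDIA-style warp; some AMD hardware uses 64


def run_warp(values):
    """Simulate one warp running `v = v // 2 if v is even else 3 * v + 1`.

    Returns the per-lane results and the number of passes the warp needed.
    Illustrative model only; real hardware details differ.
    """
    assert len(values) == WARP_SIZE
    out = list(values)
    # One mask bit per lane: True where the lane takes the "even" branch.
    mask = [v % 2 == 0 for v in values]
    passes = 0

    if any(mask):  # at least one lane takes the "even" side
        passes += 1
        for lane in range(WARP_SIZE):
            if mask[lane]:
                out[lane] = values[lane] // 2

    if not all(mask):  # at least one lane takes the "odd" side
        passes += 1
        for lane in range(WARP_SIZE):
            if not mask[lane]:
                out[lane] = 3 * values[lane] + 1

    return out, passes


if __name__ == "__main__":
    _, uniform = run_warp([2] * WARP_SIZE)
    _, mixed = run_warp(list(range(WARP_SIZE)))
    print(uniform, mixed)
```

When every lane agrees on the branch, the warp needs one pass; when lanes disagree, it needs two, with part of the warp idle in each.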

Each core (a streaming multiprocessor, or SM, in NVIDIA's terms) has on-chip memory that serves as both L1 cache and shared memory. A core typically contains 4 processing blocks, each of which can execute one warp at a time.

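A dispatched workgroup (a thread block in CUDA) is split into warps, so a workgroup larger than one warp runs as multiple warps. All warps of a workgroup run on the same core, which is what lets them use that core's shared memory.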
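The split of a workgroup into warps is simple arithmetic, sketched below in plain Python. The function names `split_into_warps` and `locate` are hypothetical. The sketch assumes a warp size of 32 and that consecutive thread indices land in the same warp, which is how CUDA groups threads by their linearized index.

```python
import math


def split_into_warps(workgroup_size, warp_size=32):
    """Return one (start, end) thread-index range per warp, end exclusive."""
    n_warps = math.ceil(workgroup_size / warp_size)
    return [
        (w * warp_size, min((w + 1) * warp_size, workgroup_size))
        for w in range(n_warps)
    ]


def locate(thread_index, warp_size=32):
    """Return (warp index, lane within that warp) for a thread in a workgroup."""
    return divmod(thread_index, warp_size)


if __name__ == "__main__":
    print(len(split_into_warps(256)))  # a 256-thread workgroup
    print(split_into_warps(100))       # the last warp is only partly filled
    print(locate(70))
```

A workgroup whose size is not a multiple of the warp size still occupies a whole warp for its tail, so the unused lanes of that last warp do no useful work.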

All cores share an L2 cache.

Related

Modal’s GPU Glossary

Created 9/14/2025
Tended
  • 9/14/2025
  • 6/21/2025